Abstract: The generator matrices of polar codes and Reed-Muller
codes are obtained by selecting rows from the Kronecker product of
a lower-triangular binary square matrix. For polar codes, the
selection is based on the Bhattacharyya parameter of the row, which
is closely related to the error probability of the corresponding
input bit under sequential decoding. For Reed-Muller codes, the
selection is based on the Hamming weight of the row. This work
investigates the properties of the index sets pointing to those rows
in the infinite blocklength limit. In particular, the Lebesgue
measure, the Hausdorff dimension, and the self-similarity of these
sets are discussed. It is shown that these index sets have several
properties that are common to fractals.

Index Terms: Polar codes, Reed-Muller codes, fractals,
self-similarity

I. Introduction 

Polar codes and Reed-Muller codes are Kronecker product-based
codes. Such a code of blocklength $2^n$ is based on the $n$-fold
Kronecker product $G(n) := F^{\otimes n}$, where

$$F := \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{1}$$

Following the terminology of [?], a rate-$K/2^n$ Kronecker
product-based code is uniquely defined by a set $\mathcal{F}$ of $K$
indices: its generator matrix is the submatrix of $G(n)$ consisting
of the rows indexed by $\mathcal{F}$. For polar codes [2], in which
each row of $G(n)$ can be interpreted as a (partially polarized)
channel, $\mathcal{F}$ consists of rows corresponding to the $K$
channels with the lowest Bhattacharyya parameters [2] (the "good"
channels, see Section II). For Reed-Muller codes, $\mathcal{F}$
consists of those rows of $G(n)$ with a Hamming weight above a
certain threshold (see Section IV). Despite its importance for code
construction, at least for polar codes, very little is known about
the structure of $\mathcal{F}$. A recent exception is the work by
Renes, Sutter, and Hassani, stating conditions under which polarized
sets are aligned, i.e., under which the good (bad) channels derived
from one binary-input memoryless channel are a subset of the good
(bad) channels derived from another [?].

That Kronecker product-based codes, such as polar codes [2] or
Reed-Muller codes, possess a fractal nature has been observed in
[?], noting the similarity between $G(n)$ and the Sierpinski
triangle. Much earlier, Abbe suspected that the set of "good"
polarized channels is fractal [?]. Nevertheless, to the best of the
author's knowledge, no definite statement regarding this fractal
nature has been made yet. In this paper, we try to fill this gap and
present results about the sets $\mathcal{F}$ for polar codes
(Section III) and Reed-Muller codes (Section V). The self-similar
structure of these sets is also suggested in [6], which shows that
polar and Reed-Muller codes are decreasing monomial codes. While [6]
focuses on finite blocklengths, we study the properties of
$\mathcal{F}$ for infinite blocklengths, i.e., for $n \to \infty$.

To simplify analysis, we represent every infinite binary sequence
indexed in $\mathcal{F}$ by a point in the unit interval $[0,1]$.
Let $\Omega := \{0,1\}^\infty$ be the set of infinite binary
sequences, and let $b := (b_1 b_2 \cdots) \in \Omega$ be an
arbitrary such sequence. We abbreviate $b^n := (b_1 b_2 \cdots b_n)$.
Let $(\Omega, \mathfrak{B}, \mathbb{P})$ be a probability space with
$\mathfrak{B}$ the Borel field generated by the cylinder sets
$S(b^n) := \{w \in \Omega\colon w_1 = b_1, \dots, w_n = b_n\}$ and
$\mathbb{P}$ a probability measure satisfying
$\mathbb{P}(S(b^n)) = 1/2^n$. The following function
$f\colon \Omega \to [0,1]$ converts these sequences to real numbers:

$$f(b) := \sum_{n=1}^{\infty} \frac{b_n}{2^n}. \tag{2}$$

Letting $\mathbb{D} := [0,1] \cap \{p/2^n\colon p \in \mathbb{Z},
n \in \mathbb{N}\}$ denote the set of dyadic rationals in the unit
interval, we recognize that $f$ is non-injective:

Example 1. $f$ maps both $b = (01111111\cdots)$ and
$b = (10000000\cdots)$ to 0.5. We call the latter binary expansion
terminating.
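
The non-injectivity can be checked numerically. The following is a minimal Python sketch, assuming finite prefixes stand in for the infinite sequences; the helper name `f_prefix` is an illustrative assumption and not part of the paper.

```python
from fractions import Fraction

def f_prefix(bits):
    """Value of sum_i b_i / 2^i over a finite prefix (b_1, ..., b_n)."""
    return sum(Fraction(b, 2 ** i) for i, b in enumerate(bits, start=1))

# Terminating expansion 1000... of 1/2 (trailing zeros add nothing).
terminating = f_prefix([1])

# First n digits of the non-terminating expansion 0111...
n = 20
non_terminating = f_prefix([0] + [1] * (n - 1))

# The gap equals 2^-n and vanishes as n grows, so both infinite
# sequences are mapped to the same point 0.5.
gap = terminating - non_terminating
print(terminating, non_terminating, gap)
```

Exact rational arithmetic is used so that the shrinking gap is not masked by floating-point rounding.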

However, as the following lemma shows, $f$ is bijective if we
exclude the dyadic rationals:

Lemma 1 ([?, Exercises 7-10, p. 80]). Let $\mathfrak{B}_{[0,1]}$ be
the Borel $\sigma$-algebra on $[0,1]$ and let $\lambda$ be the
Lebesgue measure. Then, the function $f$ in (2) satisfies the
following properties:

1) $f$ is measurable w.r.t. $\mathfrak{B}_{[0,1]}$

2) $f$ is bijective on $\Omega \setminus f^{-1}(\mathbb{D})$

3) for all $I \in \mathfrak{B}_{[0,1]}$,
   $\mathbb{P}(f^{-1}(I)) = \lambda(I)$

We believe that the results we prove in the following do more than
improve our understanding of polar and Reed-Muller codes. Since its
introduction in 2009, the polarization technique proposed by Arikan
has found its way into areas different from polar coding.
Haghighatshoar and Abbe showed in the context of compression of
analog sources that Rényi information dimension can be polarized
[?], and Abbe and Wigderson used polarization for the construction
of high-girth matrices [?]. Recently, Nasser proved that a binary
operation is polarizing if and only if it is uniformity preserving
and its inverse is strongly ergodic [?], [?]. We believe that our
results might carry over to these areas as well; Section VI points
to possible extensions.
II. Preliminaries for Polar Codes

We adopt the notation of [2]. Let $W\colon \{0,1\} \to \mathcal{Y}$
be a binary-input memoryless channel with output alphabet
$\mathcal{Y}$, capacity $0 < I(W) < 1$, and with Bhattacharyya
parameter

$$Z(W) := \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\,W(y|1)}. \tag{3}$$

That $Z(W) = 0 \Leftrightarrow I(W) = 1$ and
$Z(W) = 1 \Leftrightarrow I(W) = 0$ is a direct consequence of
[2, Prop. 1]. We say a channel is symmetric if there exists a
permutation $\pi\colon \mathcal{Y} \to \mathcal{Y}$ such that
$\pi^{-1} = \pi$ and, for every $y \in \mathcal{Y}$,
$W(y|0) = W(\pi(y)|1)$.

The heart of Arikan’s polarization technique is that two 
channel uses of W can be combined and split into one use of 
a “worse” channel 

$$W_2^0(y_1^2|u_1) := \frac{1}{2} \sum_{u_2} W(y_1|u_1 \oplus u_2)\, W(y_2|u_2) \tag{4a}$$

and one use of a "better" channel

$$W_2^1(y_1^2, u_1|u_2) := \frac{1}{2}\, W(y_1|u_1 \oplus u_2)\, W(y_2|u_2) \tag{4b}$$

where $u_1, u_2 \in \{0,1\}$ and $y_1, y_2 \in \mathcal{Y}$. In
essence, the combining operation codes two input bits by $F$ in (1)
and transmits the coded bits over $W$ via two channel uses, creating
a vector channel. The splitting operation splits this vector channel
into the two virtual binary-input memoryless channels indicated in
(4). Of these, the better (worse) channel has a strictly larger
(smaller) capacity than the original channel $W$, i.e.,
$I(W_2^0) < I(W) < I(W_2^1)$, while the sum capacity equals twice
the capacity of the original channel, i.e.,
$I(W_2^0) + I(W_2^1) = 2I(W)$ [2, Prop. 4].

The effect of combining and splitting on the channel capacities
$I(W_2^0)$ and $I(W_2^1)$ admits no closed-form expression; the
effect on the Bhattacharyya parameter at least admits bounds:

Lemma 2 ([2, Prop. 5 & 7]).

$$Z(W_2^1) = g_1(Z(W)) := Z^2(W) < Z(W) \tag{5a}$$

$$Z(W) < Z(W_2^0) \le g_0(Z(W)) := 2Z(W) - Z^2(W) \tag{5b}$$

with equality if $W$ is a binary erasure channel.

Channels with larger blocklengths $2^n$, $n > 1$, can either be
obtained by direct $n$-fold combining (using the matrix $G(n)$) and
$n$-fold splitting, or by recursive pairwise combining and
splitting. For $b^n \in \{0,1\}^n$, we obtain

$$\left(W_{2^n}^{b^n}, W_{2^n}^{b^n}\right) \to \left(W_{2^{n+1}}^{b^n 0}, W_{2^{n+1}}^{b^n 1}\right) \tag{6}$$

where $b^n 0$ and $b^n 1$ denote the sequences of zeros and ones
obtained by appending 0 and 1 to $b^n$, respectively. Note that
$g_1$ and $g_0$ from Lemma 2 are non-negative and non-decreasing
functions mapping the unit interval onto itself, hence the
inequality in (5b) is preserved under composition:

$$Z(W_{2^n}^{b^n}) \le p_{b^n}(Z(W)) := g_{b_n}(g_{b_{n-1}}(\cdots g_{b_1}(Z(W)) \cdots)) \tag{7}$$
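
The composition in (7) is straightforward to evaluate numerically. The sketch below is illustrative only; the function names `g` and `p` are assumptions chosen to mirror the symbols $g_i$ and $p_{b^n}$. It applies $g_{b_1}$ first and $g_{b_n}$ last. For a BEC the result is the exact Bhattacharyya parameter, otherwise it is only an upper bound.

```python
def g(bit, z):
    """g_1(z) = z^2 (better channel), g_0(z) = 2z - z^2 (worse channel)."""
    return z * z if bit == 1 else 2 * z - z * z

def p(bits, z):
    """p_{b^n}(z) = g_{b_n}(... g_{b_1}(z) ...), applied left to right."""
    for bit in bits:
        z = g(bit, z)
    return z

z = 0.5
print(p([1, 0], z))  # g_0(g_1(z)) = 2 z^2 - z^4
print(p([0, 1], z))  # g_1(g_0(z)) = (2 z - z^2)^2
```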

The channel polarization theorem shows that, with probability one,
after infinitely many combinations and splits, only perfect or
useless channels remain, i.e., either $I(W_\infty^b) = 1$ or
$I(W_\infty^b) = 0$ for $b \in \{0,1\}^\infty$. This is made precise
in:

Proposition 1 ([2, Prop. 10]). With probability one, the limit RV
$I_\infty(b) := I(W_\infty^b)$ takes values in the set $\{0,1\}$:
$\mathbb{P}(I_\infty = 1) = I(W)$ and
$\mathbb{P}(I_\infty = 0) = 1 - I(W)$.

This immediately gives rise to

Definition 1 (The Good and the Bad Channels). Let $\mathcal{G}$
denote the set of good channels, i.e.,

$$x \in \mathcal{G} \Leftrightarrow \exists b \in f^{-1}(x)\colon I(W_\infty^b) = 1. \tag{8a}$$

Let $\mathcal{B}$ denote the set of bad channels, i.e.,

$$x \in \mathcal{B} \Leftrightarrow \exists b \in f^{-1}(x)\colon I(W_\infty^b) = 0. \tag{8b}$$

If the polarization procedure is stopped at a finite blocklength
$2^n$ for $n$ large enough, it can still be shown that the vast
majority of the resulting $2^n$ channels are either almost perfect
or almost useless, in the sense that the channel capacities are
close to one or to zero (or that the corresponding Bhattacharyya
parameters are close to zero or to one). The idea of polar coding is
to transmit data only on those channels that are almost perfect:
$n$-fold combining and splitting leads to $2^n$ virtual channels,
each corresponding to a row of $G(n)$. The channels with high
capacity are indicated by $\mathcal{F}$, and the generator matrix of
the corresponding polar code is the submatrix of $G(n)$ consisting
of those indicated rows. If the blocklength grows to infinity
($n \to \infty$), the set $\mathcal{F}$ becomes equivalent to the
set $\mathcal{G}$ in Definition 1.
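
For a BEC, the finite-blocklength selection described above can be sketched in a few lines, because the Bhattacharyya parameters follow the exact recursion $z \mapsto z^2$ (better channel) and $z \mapsto 2z - z^2$ (worse channel). The function names `bec_bhattacharyya` and `select_rows`, and the choice $n = 3$, $K = 4$, $\epsilon = 0.5$, are illustrative assumptions; for other channels this recursion only gives an upper bound and the selection would be heuristic.

```python
from itertools import product

def bec_bhattacharyya(bits, eps):
    """Exact Z of the virtual channel indexed by bits for a BEC(eps)."""
    z = eps
    for bit in bits:
        z = z * z if bit == 1 else 2 * z - z * z
    return z

def select_rows(n, k, eps):
    """Return the k index sequences b^n with the smallest Z, best first."""
    channels = list(product((0, 1), repeat=n))
    channels.sort(key=lambda bits: bec_bhattacharyya(bits, eps))
    return channels[:k]

chosen = select_rows(n=3, k=4, eps=0.5)
for bits in chosen:
    print(bits, bec_bhattacharyya(bits, 0.5))
```

Each selected tuple identifies one row of $G(n)$; stacking those rows would give the generator matrix of the corresponding polar code.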

The difficulty of polar coding lies in code construction, i.e., in
determining which channels/row indices are in the sets $\mathcal{F}$
and $\mathcal{G}$ for finite and infinite blocklengths. This
immediately translates to the question which sequences
$b \in \{0,1\}^\infty$ correspond to combinations and splits leading
to a perfect channel (or which finite-length sequences $b^n$ lead to
channels with capacity sufficiently close to one). Determining the
capacity of the virtual channels is an inherently difficult
operation, since, whenever $W$ is not a binary erasure channel
(BEC), the cardinality of the output alphabet increases
exponentially in $2^n$ [?, Ch. 3.3], [13, p. 36]. To circumvent this
problem, Tal and Vardy presented an approximate construction method
in [14] that relies on reduced output alphabet channels that are
either upgraded or degraded w.r.t. the channel of interest. As these
upgrading/degrading properties, mentioned earlier in Korada's PhD
thesis [13, Def. 1.7 & Lem. 1.8], play a fundamental role in this
work, we present

Definition 2 (Channel Up- and Degrading). A channel
$W^-\colon \{0,1\} \to \mathcal{Z}$ is degraded w.r.t. the channel
$W$ (short: $W^- \preccurlyeq W$) if there exists a channel
$Q\colon \mathcal{Y} \to \mathcal{Z}$ such that

$$W^-(z|u) = \sum_{y \in \mathcal{Y}} W(y|u)\, Q(z|y). \tag{9}$$

A channel $W^+\colon \{0,1\} \to \mathcal{Z}$ is upgraded w.r.t. the
channel $W$ (short: $W^+ \succcurlyeq W$) if there exists a channel
$P\colon \mathcal{Z} \to \mathcal{Y}$ such that

$$W(y|u) = \sum_{z \in \mathcal{Z}} W^+(z|u)\, P(y|z). \tag{10}$$

Moreover, $W^+ \succcurlyeq W$ if and only if $W \preccurlyeq W^+$.

The upgraded (degraded) approximation remains upgraded (degraded)
during combining and splitting:
Lemma 3 ([13, Lem. 4.7] & [14, Lem. 3]). Assume that
$W^- \preccurlyeq W \preccurlyeq W^+$. Then,

$$I(W^-) \le I(W) \le I(W^+) \tag{11a}$$

$$Z(W^-) \ge Z(W) \ge Z(W^+) \tag{11b}$$

$$(W^-)_2^1 \preccurlyeq W_2^1 \preccurlyeq (W^+)_2^1 \tag{11c}$$

$$(W^-)_2^0 \preccurlyeq W_2^0 \preccurlyeq (W^+)_2^0. \tag{11d}$$


It can be shown that the better channel (4b) obtained from
combining and splitting is upgraded w.r.t. the original channel (as
already mentioned in [?, p. 9]). The worse channel (4a) is degraded
at least if $W$ is symmetric.

Lemma 4 ([?, p. 9] & [?, Lem. 3]). $W \preccurlyeq W_2^1$. If $W$ is
symmetric, then $W_2^0 \preccurlyeq W \preccurlyeq W_2^1$.

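Proof: By choosing

$$P(y|y_1^2, u_1) = \begin{cases} 1, & \text{if } y = y_2 \\ 0, & \text{else} \end{cases} \tag{12}$$

one can show that $W \preccurlyeq W_2^1$. To show that also
$W_2^0 \preccurlyeq W$ for symmetric channels, take [?, Lem. 3]

$$Q(y_1^2|y) = \begin{cases} \frac{1}{2} W(y_2|0), & \text{if } y_1 = y \\ \frac{1}{2} W(y_2|1), & \text{if } y_1 = \pi(y) \\ 0, & \text{else.} \end{cases} \tag{13}$$

■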

Example 2. For a BEC $W$ with erasure probability $\epsilon$,
$W_2^1$ is a BEC with erasure probability $\epsilon^2$ and $W_2^0$
is a BEC with erasure probability $2\epsilon - \epsilon^2$
[2, Prop. 6]. The channel $W_2^1$ is an upgrade of $W$, because it
can be degraded to $W$ by appending a BEC with erasure probability
$\epsilon/(1+\epsilon)$. The channel $W_2^0$ is degraded w.r.t. $W$
by appending a BEC with erasure probability $\epsilon$.
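
The two cascades in Example 2 can be verified with elementary arithmetic. The sketch below assumes that appending a BEC means a serial concatenation in which a symbol survives only if neither stage erases it; the helper name `cascade_bec` and the value $\epsilon = 0.3$ are illustrative assumptions.

```python
def cascade_bec(eps_first, eps_second):
    """Erasure probability of two BECs in series: a symbol survives
    only if neither stage erases it."""
    return 1 - (1 - eps_first) * (1 - eps_second)

eps = 0.3
better = eps ** 2           # erasure probability of the better channel
worse = 2 * eps - eps ** 2  # erasure probability of the worse channel

# Degrading the better channel by BEC(eps / (1 + eps)) recovers W.
recovered = cascade_bec(better, eps / (1 + eps))

# Degrading W by another BEC(eps) yields the worse channel.
degraded = cascade_bec(eps, eps)

print(recovered, degraded, worse)
```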

III. Properties of the Sets $\mathcal{G}$ and $\mathcal{B}$

In this section we develop the properties of the sets of good and
bad channels.

Proposition 2. For almost all $x$, there exists a value
$0 < \vartheta(x) < 1$ such that $Z(W) < \vartheta(x)$ implies
$x \in \mathcal{G}$. If $W$ is a BEC, then additionally
$Z(W) > \vartheta(x)$ implies $x \in \mathcal{B}$.

Proof: See Appendix A. ■

If $W$ is not a BEC, it may happen that $Z(W) > \vartheta(f(b))$
while still $I(W_\infty^b) = 1$. This leads to the question whether
the set of good channels is (almost surely) increasing with
decreasing Bhattacharyya parameter, i.e., if the sets of good
channels for $W$ and $W'$ with $Z(W) > Z(W')$ are aligned. While in
general the answer is negative [?], Proposition 2 answers it
positively if $W$ is a BEC: The set of good channels for a BEC is
also good for any binary-input memoryless channel with a smaller
Bhattacharyya parameter [?].

Example 3. For $x \in \mathbb{D}$, $\vartheta(x) = 1$: If
$Z(W) < 1$, i.e., if the channel is not completely useless a priori,
the non-terminating expansion of $x$ will make it a perfect channel
(cf. Proposition 4).

In Appendix B we prove that the thresholds of Proposition 2 are
symmetric:

Proposition 3. For those $x \notin \mathbb{D}$ for which
$\vartheta(x)$ exists, $\vartheta(1 - x) = 1 - \vartheta(x)$.

The case $x \in \mathbb{Q} \setminus \mathbb{D}$ is interesting. In
this case, the binary expansion is unique and recurring, i.e., there
is a length-$k$ sequence $a^k \in \{0,1\}^k$ such that
$f(b^n a^k a^k a^k \cdots) = x$ for some $b^n \in \{0,1\}^n$. It is
straightforward to show that for every non-trivial sequence $a^k$
(i.e., $a^k$ contains zeros and ones), $p_{a^k}$ is from $[0,1]$ to
$[0,1]$, non-negative, and non-decreasing, with vanishing
derivatives at 0 and 1. Since this ensures that $p_{a^k}(z) < z$ for
$z$ close to zero and $p_{a^k}(z) > z$ for $z$ close to one, the
operation $z_{i+1} = p_{a^k}(z_i)$ constitutes an iterated function
system with attracting fixed points at $z = 0$ and $z = 1$. Note
further that, since $p_{a^k}$ corresponds to the recurring part of
the binary expansion of $x$, $Z(W_\infty^{b^n a^k a^k \cdots})$ will
be bounded from above by the value to which this iterated function
system converges after being initialized with $Z(W_{2^n}^{b^n})$. To
show that Proposition 2 holds for
$x \in \mathbb{Q} \setminus \mathbb{D}$ requires showing that
$p_{a^k}$ intersects the identity function only once on $(0,1)$,
i.e., that there is no attracting fixed point on this open interval.
We leave this problem for future investigation.

Example 4. Let $x = 2/3$, hence $f^{-1}(x) = 101010101\cdots$. It
suffices to consider one period of the recurring sequence and
determine its fixed points. In this case we get
$p_{10}(z) = 2z^2 - z^4$. Its fixed points are the roots of
$p_{10}(z) - z$; removing the trivial roots at $z = 0$ and $z = 1$
leaves two further roots at $(\pm\sqrt{5} - 1)/2$. One of these
roots lies outside $[0,1]$ and is hence irrelevant. The remaining
root determines the threshold, $\vartheta(2/3) = (\sqrt{5} - 1)/2$.

Let $W$ be a BEC with erasure probability
$\epsilon = Z(W) = \vartheta(2/3)$. Since
$\epsilon = \vartheta(2/3)$ is a fixed point of the iterated
function system corresponding to the recurring binary expansion, one
gets $Z(W_\infty^{1010\cdots}) = \epsilon \notin \{0,1\}$. This
example illustrates that Proposition 1 holds only almost surely.
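
The fixed-point behaviour in Example 4 can be reproduced numerically. The following sketch assumes a BEC, so that one period of the expansion $1010\cdots$ acts on the erasure probability as $z \mapsto 2z^2 - z^4$; the function names and the perturbation size $10^{-3}$ are illustrative assumptions.

```python
import math

def p10(z):
    """One period of 1010...: g_0(g_1(z)) = 2 z^2 - z^4."""
    return 2 * z ** 2 - z ** 4

theta = (math.sqrt(5) - 1) / 2  # claimed threshold for x = 2/3

def iterate(z, periods=60):
    """Apply the period map repeatedly, starting from z."""
    for _ in range(periods):
        z = p10(z)
    return z

print(abs(p10(theta) - theta))  # close to zero: theta is a fixed point
print(iterate(theta - 1e-3))    # drifts towards 0 (good channel)
print(iterate(theta + 1e-3))    # drifts towards 1 (bad channel)
```

The interior fixed point is repelling, which is why an arbitrarily small perturbation of the erasure probability decides between a perfect and a useless limit channel.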

Proposition 4. $\mathcal{G} \cap \mathcal{B} = \mathbb{D}$.

Proof: See Appendix C. ■

That the intersection of the sets of good and bad channels is
non-empty is a direct consequence of the non-injectivity of $f$.
Note further that this intersection cannot be larger, since
$\mathbb{D}$ is the only set to which $f$ maps non-injectively.
Since $\mathbb{D}$, a common subset of $\mathcal{G}$ and
$\mathcal{B}$, is dense in $[0,1]$, both the set of good channels
and the set of bad channels are dense in the unit interval. But even
if dyadic rationals are excluded, results about denseness can be
proved:

Proposition 5. $\mathcal{G} \setminus \mathbb{D}$ is dense in
$[0,1]$. If $W$ is a BEC, then also
$\mathcal{B} \setminus \mathbb{D}$ is dense in $[0,1]$.

Proof: See Appendix D. ■

The proposition states that, at least for the BEC, there is no
interval which contains only good channels. Hence, given a specific
channel $W_{2^n}^{b^n}$, it is not possible to assume that a
well-specified subset of channels (e.g., all $W_\infty^{b^n a}$ for
$a$ starting with 1) generated from this channel by combining and
splitting will be perfect. The construction algorithm for an
infinite-blocklength, vanishing-error polar code hence cannot stop
at a finite blocklength. This is in contrast with finite-blocklength
polar codes, for which an approximate construction technique
suggests to stop polarizing some channels at already shorter
blocklengths [?].

Fig. 1. The polar fractal for a BEC. The center plot shows the
thresholds $\vartheta(x)$ for $x \in [0,1]$, while the bottom and
the top plots show these thresholds for the scaled and shifted sets
$[0, 0.5]$ and $[0.5, 1]$, respectively. Hence, the thresholds in
the top plot are larger than the thresholds in the center plot,
which are larger than those in the bottom plot. The indicator
function of $\mathcal{G}$ is obtained by setting each value in the
plot to one (zero) if the erasure probability $\epsilon$ is smaller
(larger) than the threshold. Note further that the figure
illustrates the symmetry of $\vartheta(x)$ mentioned in
Proposition 3.

Proposition 6. $\mathcal{G}$ is Lebesgue measurable and has
Lebesgue measure $\lambda(\mathcal{G}) = I(W)$. $\mathcal{B}$ is
Lebesgue measurable and has Lebesgue measure
$\lambda(\mathcal{B}) = 1 - I(W)$.

Proof: See Appendix E. ■

Note that $\lambda(\mathcal{G} \cup \mathcal{B}) = 1$ although
$\mathcal{G} \cup \mathcal{B} \subset [0,1]$. The reason is that
convergence to good or bad channels is only almost sure, i.e., there
may be channels that are neither good nor bad (see Example 4).

An immediate consequence of Proposition 6 is that $\mathcal{G}$ and
$\mathcal{B}$ have a Hausdorff dimension equal to one. This follows
from the fact that the one-dimensional Hausdorff measure of a set
equals its Lebesgue measure up to a constant [?, eq. (3.4), p. 45].
Since, thus, the one-dimensional Hausdorff measures of $\mathcal{G}$
and $\mathcal{B}$ are positive and finite, we have

Corollary 1. The Hausdorff dimensions of $\mathcal{G}$ and
$\mathcal{B}$ satisfy $d(\mathcal{G}) = 1$ and $d(\mathcal{B}) = 1$.

Also the box-counting dimensions [?, p. 28] are equal to one, since
both sets are dense on the unit interval [?, Prop. 2.6].

We finally come to the claim that polar codes are fractal. 
Following Falconer's definition [?, p. xxviii], a set is fractal
if it is (at least approximately) self-similar and has detail on 
arbitrarily small scales, or if its fractal dimension (e.g., its 
Hausdorff dimension) is larger than its topological dimension. 
Whether or not the result shown below will convince the 
reader of this property is a mere question of definition; strictly 
speaking, we can show only quasi self-similarity of Q: 

Proposition 7. Let $\mathcal{G}_n(k) := \mathcal{G} \cap
[(k-1)2^{-n}, k2^{-n}]$ for $k = 1, \dots, 2^n$.
$\mathcal{G} = \mathcal{G}_0(1)$ is quasi self-similar in the sense
that, for all $n$ and all $k$,
$\mathcal{G}_n(k) = \mathcal{G}_{n+1}(2k-1) \cup \mathcal{G}_{n+1}(2k)$
is quasi self-similar to its right half:

$$\mathcal{G}_n(k) \subset 2\mathcal{G}_{n+1}(2k) - k2^{-n} \tag{14}$$

If $W$ is symmetric, $\mathcal{G}_n(k)$ is quasi self-similar:

$$2\mathcal{G}_{n+1}(2k-1) - (k-1)2^{-n} \subset \mathcal{G}_n(k) \subset 2\mathcal{G}_{n+1}(2k) - k2^{-n} \tag{15}$$

Proof: See Appendix F. ■

In other words, at least for a symmetric channel, $\mathcal{G}$ is
composed of two similar copies of itself (see Fig. 1). The
self-similarity is closely related to the fact that polar codes are
decreasing monomial codes [6, Thm. 1]. Along the same lines, the
quasi self-similarity of $\mathcal{B}$ can be shown.

Example 5. By careful computations we obtain
$\vartheta(1/6) \approx 0.214$, $\vartheta(1/3) \approx 0.382$, and
$\vartheta(2/3) \approx 0.618$. Indeed, if we consider 1/3 in
$\mathcal{G}$, then 1/6 and 2/3 are the corresponding values in
$\mathcal{G}_1(1)$ and $\mathcal{G}_1(2)$. Since
$\vartheta(1/6) < \vartheta(1/3) < \vartheta(2/3)$, for the BEC we
have the inclusion indicated in Proposition 7.
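
The thresholds quoted in Example 5 can be estimated numerically for a BEC. The sketch below is one possible approach under stated assumptions, not necessarily the computation used for the example: it follows the exact BEC recursion along a long finite prefix of the binary expansion and bisects on the erasure probability. The function names, the prefix length of 400 bits, and the 50 bisection steps are illustrative choices.

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits of a rational x in [0, 1), computed exactly."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)
        digits.append(bit)
        x -= bit
    return digits

def polarizes_to_good(bits, eps):
    """Follow the exact BEC recursion and report whether Z ends below 1/2."""
    z = eps
    for bit in bits:
        z = z * z if bit == 1 else 2 * z - z * z
    return z < 0.5

def threshold(x, n_bits=400, steps=50):
    """Bisection estimate of the BEC threshold for a non-dyadic rational x."""
    bits = binary_digits(Fraction(x), n_bits)
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if polarizes_to_good(bits, mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for x in (Fraction(1, 6), Fraction(1, 3), Fraction(2, 3)):
    print(x, round(threshold(x), 3))
```

Bisection is valid here because the composed map is non-decreasing in the initial erasure probability, so the set of erasure probabilities leading to a good channel is an interval starting at zero.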


IV. Preliminaries for Reed-Muller Codes

As mentioned above, a rate-$K/2^n$ Reed-Muller code has a
$K \times 2^n$ generator matrix with all $K$ rows having a Hamming
weight larger than a predefined threshold. To make this more
precise, let $w(b^n) := \sum_{i=1}^{n} b_i$ be the Hamming weight of
$b^n \in \{0,1\}^n$ and let $s_i(n)$ be the $i$-th row of $G(n)$.
The generator matrix $G_{RM}(r, n)$ of an order-$r$, length-$2^n$
Reed-Muller code consists of the rows of $G(n)$ indicated in [?]

$$\mathcal{F} = \{i \in \{1, \dots, 2^n\}\colon w(s_i(n)) \ge 2^{n-r}\}. \tag{16}$$

Trivially, $G_{RM}(n, n) = G(n)$, while $G_{RM}(0, n)$ is a single
row vector containing only ones (length-$2^n$ repetition code). To
analyze the effect of doubling the block length, note that

$$G(n+1) = \begin{bmatrix} G(n) & 0 \\ G(n) & G(n) \end{bmatrix}. \tag{17}$$


Assume that we indicate the rows of $G(n)$ by a sequence of binary
numbers, i.e., let the $i$-th row be indexed by
$h_n(b^n) := 1 + 2^n \sum_{i=1}^{n} b_i 2^{-i}$. Furthermore, let
$0b^n$ and $1b^n$ denote the sequences of zeros and ones obtained by
prepending 0 and 1 to $b^n$, respectively. Clearly,
$h_{n+1}(0b^n) = h_n(b^n)$ and $h_{n+1}(1b^n) = h_n(b^n) + 2^n$.
Combining this with (17) yields

$$w(s_{h_{n+1}(0b^n)}(n+1)) = w(s_{h_n(b^n)}(n)) \tag{18}$$

$$w(s_{h_{n+1}(1b^n)}(n+1)) = 2w(s_{h_n(b^n)}(n)). \tag{19}$$

Defining $G(0) := 1$, we thus get

$$w(s_{h_n(b^n)}(n)) = 2^{w(b^n)} \tag{20}$$

and

$$\mathcal{F} = h_n\left(\{b^n \in \{0,1\}^n\colon 2^{w(b^n)} \ge 2^{n-r}\}\right). \tag{21}$$
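
Relation (20) and the resulting Reed-Muller row selection can be checked by brute force for small $n$. The sketch below builds $G(n)$ with a plain-Python Kronecker product; the helper names and the choice $n = 4$, $r = 1$ are illustrative assumptions, and rows are indexed from 0 here rather than from 1 as in the text.

```python
from itertools import product

def kron(a, b):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def generator(n):
    """G(n) as the n-fold Kronecker product of F = [[1, 0], [1, 1]]."""
    g = [[1]]  # G(0) := 1
    for _ in range(n):
        g = kron([[1, 0], [1, 1]], g)
    return g

def row_index(bits):
    """0-based row index whose binary representation is b_1 b_2 ... b_n."""
    return int("".join(map(str, bits)), 2)

n, r = 4, 1
G = generator(n)
for bits in product((0, 1), repeat=n):
    # Row weight equals 2 to the Hamming weight of the index sequence.
    assert sum(G[row_index(bits)]) == 2 ** sum(bits)

# Rows of the order-r Reed-Muller code: weight at least 2^(n - r).
rm_rows = [i for i, row in enumerate(G) if sum(row) >= 2 ** (n - r)]
print(len(rm_rows), "rows selected for r =", r)
```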

Letting the blocklengths go to infinity, we may ask questions about
the following set:

Definition 3 (The Heavy Channels). Let $\mathcal{H}(\rho)$ denote
the set of $\rho$-heavy channels, i.e.,

$$x \in \mathcal{H}(\rho) \Leftrightarrow \exists b \in f^{-1}(x)\colon \liminf_{n \to \infty} \frac{2^{w(b^n)}}{2^{n\rho}} \ge 1. \tag{22}$$

Loosely speaking, the set of heavy channels corresponds to those
rows of $G(n)$ which asymptotically have a Hamming weight larger
than a given threshold.

Example 6. $\mathcal{H}(1) = \{1\}$. This follows from the fact that
1 is the only number in the unit interval with a binary expansion
consisting only of ones. $\mathcal{H}(0) = [0,1]$. This follows from
the fact that $w(b^n) \ge 0$.

The results we will show for the set $\mathcal{H}(\rho)$ are
tightly linked to the concept of normal numbers.

Definition 4 (Normal Numbers). A number $x \in [0,1]$ is called
simply normal to base 2 ($x \in \mathcal{N}$) iff

$$\exists b \in f^{-1}(x)\colon \lim_{n \to \infty} \frac{w(b^n)}{n} = \frac{1}{2}. \tag{23}$$

In general, a number is simply normal in base $M$ if the relative
frequency of each digit in its $M$-ary expansion is $1/M$. A number
is called normal if this property holds not only for digits but also
for subsequences: a number is normal in base $M$ if, for each
$k \ge 1$, the relative frequency of each length-$k$ sequence in its
$M$-ary expansion is $1/M^k$. It immediately follows that a normal
number is simply normal. The converse is in general not true:


Example 7. Let $x = 1/3$, hence $b = 010101\cdots$. $x$ is simply
normal to base 2, but not normal (since the sequences 00 and 11
never occur). Let $x = 1/7$, hence $b = 001001001\cdots$. $x$ is
neither normal nor simply normal. Let $x \in \mathbb{D}$, hence $b$
is either terminating ($\lim_{n \to \infty} w(b^n)/n = 0$) or
non-terminating ($\lim_{n \to \infty} w(b^n)/n = 1$). Dyadic
rationals are not simply normal.

Lemma 5 (Borel's Law of Large Numbers, cf. [?, Cor. 8.1, p. 70]).
Almost all numbers in $[0,1]$ are simply normal, i.e.,

$$\lambda(\mathcal{N}) = 1. \tag{24}$$

Although normal numbers are, in this sense, normal, there are
uncountably many numbers in the unit interval which are not normal.
Moreover, the set of numbers that are not normal is superfractal,
i.e., it has a Hausdorff dimension equal to one although it has zero
Lebesgue measure [?].


V. Properties of the Set $\mathcal{H}$

We show in Appendix G that the dyadic rationals are not only good
and bad, but also heavy:

Proposition 8. For all $\rho \in [0,1)$,
$\mathbb{D} \subset \mathcal{H}(\rho)$.

It follows that $\mathcal{H}(\rho)$ is dense in $[0,1]$ for all
$\rho \in [0,1)$.

The Lebesgue measure of the set of good channels was equal to the
channel capacity of $W$. The result for heavy channels is inherently
different, because $\mathcal{H}(\rho)$ does not depend on $W$. The
proof of the following result can be found in Appendix H.

Proposition 9. $\mathcal{H}(\rho)$ is Lebesgue measurable and has
Lebesgue measure

$$\lambda(\mathcal{H}(\rho)) = \begin{cases} 1, & \text{if } \rho < 1/2 \\ 0, & \text{if } \rho \ge 1/2. \end{cases} \tag{25}$$

The result is surprising since it suggests a phase transition for
the rate of Reed-Muller codes: If $\rho < 1/2$, the
infinite-blocklength Reed-Muller code consists of almost all (in the
sense of Lebesgue measure) possible binary sequences. In contrast,
if $\rho > 1/2$, the infinite-blocklength Reed-Muller code consists
of almost no code words (again, in the sense of Lebesgue measure).
The picture is not as simple if one also considers the Hausdorff
dimension of $\mathcal{H}(\rho)$. In Appendix I we prove that
$\mathcal{H}(\rho)$ has positive Hausdorff dimension even if it is a
Lebesgue null set.
Proposition 10. The Hausdorff dimension satisfies

$$d(\mathcal{H}(\rho)) \begin{cases} = 1, & \text{if } \rho < 1/2 \\ \ge h_2(\rho), & \text{if } \rho \ge 1/2 \end{cases} \tag{26}$$

where $h_2(x) := -x \log_2 x - (1 - x) \log_2(1 - x)$.
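
To get a feeling for the lower bound in Proposition 10, the binary entropy function can be tabulated. This is a minimal sketch; the function name `h2` mirrors the symbol in the text, and the sample values of $\rho$ are arbitrary illustrative choices.

```python
import math

def h2(x):
    """Binary entropy in bits, with the convention 0 * log2(0) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

for rho in (0.5, 0.6, 0.75, 0.9, 1.0):
    print(rho, round(h2(rho), 4))
```

The bound decreases from one at $\rho = 1/2$ to zero at $\rho = 1$, which is consistent with $\mathcal{H}(1)$ being a single point.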


Unfortunately, we were not able to give an exact expression for the
Hausdorff dimension of $\mathcal{H}(\rho)$ for $\rho \ge 1/2$. While
the set of all non-normal numbers is superfractal, we are not sure
if this holds also for a proper subset.

The sets $\mathcal{G}$ and $\mathcal{B}$ exhibit self-similarity,
i.e., detailed structure on every scale (cf. Fig. 1). We next show
that also $\mathcal{H}(\rho)$ is self-similar. At least for
$\mathcal{H}(0)$ and $\mathcal{H}(1)$ (cf. Example 6) this is as
trivial as the self-similarity of a point or a line. For
$\rho \in (0,1)$ this self-similarity is more interesting, and
related to the fact that Reed-Muller codes are decreasing monomial
codes [6, Prop. 2]. In Appendix J we prove

Proposition 11. Let $\mathcal{H}_n(\rho, k) := \mathcal{H}(\rho)
\cap [(k-1)2^{-n}, k2^{-n}]$ for $k = 1, \dots, 2^n$.
$\mathcal{H}(\rho) = \mathcal{H}_0(\rho, 1)$ is quasi self-similar
in the sense that, for all $n$ and all $k$,
$\mathcal{H}_n(\rho, k) = \mathcal{H}_{n+1}(\rho, 2k-1) \cup
\mathcal{H}_{n+1}(\rho, 2k)$ is quasi self-similar:

$$2\mathcal{H}_{n+1}(\rho, 2k-1) - (k-1)2^{-n} \subset \mathcal{H}_n(\rho, k) \subset 2\mathcal{H}_{n+1}(\rho, 2k) - k2^{-n}. \tag{27}$$


VI. Discussion & Outlook 

That polar codes satisfy fractal properties has long been
suspected: Every nontrivial, partly polarized channel
$W_{2^n}^{b^n}$ gives rise, by further polarization, to both perfect
and useless channels, regardless how close $I(W_{2^n}^{b^n})$ is to
zero or one. This fact is reflected in our Propositions 4 and 5,
which state that the good channels are dense in the unit interval
(and so are the bad channels for BECs): A partial polarization with
sequence $b^n$ corresponds to an interval with dyadic endpoints, and
denseness implies that in this interval there will be both perfect
and useless channels. Proposition 7, claiming the self-similarity of
the sets of good and bad channels, goes one step further and gives
these sets structure: If a channel polarized according to the
sequence $b^n a$ is good, then so is the channel polarized according
to $b^n 1 a$.

An obvious extension of our work should deal with the fractal
properties of non-binary polar and Reed-Muller codes. For example,
if $q$ is a prime number, then every invertible $\ell \times \ell$
matrix with entries from $\{0, \dots, q-1\}$ is polarizing, unless
it is upper-triangular [?, Thm. 5.2]. The $n$-fold Kronecker product
of one of these matrices generates $\ell^n$ channels. It is easy to
design a function mapping $\{0, \dots, \ell-1\}^\infty$ to $[0,1]$
(cf. (2)), admitting an analysis similar to the one presented in
this paper. Along the same lines, it would be interesting to examine
the properties of $q$-ary Reed-Muller codes, e.g., [20], [?].

Whether binary or not, it is presently not clear how our
infinite-blocklength results can be carried over to practically
relevant finite-length codes. Future work shall investigate this
issue.
Appendix A

Proof of Proposition 2

Recall that, by Lemma 2, we have

$$Z(W_{2^n}^{b^n}) \le p_{b^n}(Z(W)) := g_{b_n}(g_{b_{n-1}}(\cdots g_{b_1}(Z(W)) \cdots)).$$

Lemma 6 ([22, Lem. 11]). For $\mathbb{P}$-almost every realization
$b \in \Omega$, there exists a point $\theta(b) \in [0,1]$, such
that

$$\lim_{n \to \infty} p_{b^n}(z) = \begin{cases} 0, & z \in [0, \theta(b)) \\ 1, & z \in (\theta(b), 1]. \end{cases} \tag{28}$$

Furthermore, the thus constructed RV $\theta$ is uniformly
distributed on $[0,1]$.

If $Z(W) < \theta(b)$, then
$Z(W_\infty^b) \le \lim_{n \to \infty} p_{b^n}(Z(W)) = 0$, and hence
$f(b) \in \mathcal{G}$. We now define
$\vartheta(f(b)) := \theta(b)$ if $f(b) \notin \mathbb{D}$ and
$\vartheta(f(b)) = 1$ if $f(b) \in \mathbb{D}$ (since
$\mathbb{D} \subset \mathcal{G}$ by Proposition 4).

Proof for BECs: If $W$ is a BEC, then
$Z(W_{2^n}^{b^n}) = p_{b^n}(Z(W))$. Hence, by Lemma 6, if
$\epsilon < \theta(b)$, then
$Z(W_\infty^b) = \lim_{n \to \infty} p_{b^n}(\epsilon) = 0$, and if
$\epsilon > \theta(b)$, then
$Z(W_\infty^b) = \lim_{n \to \infty} p_{b^n}(\epsilon) = 1$. ■


Appendix B

Proof of Proposition 3

Let $b \in f^{-1}([0,1] \setminus \mathbb{D}) \subset
\{0,1\}^\infty$, and let $\bar{b}$ be such that
$\bar{b}_i = 1 - b_i$ for all $i$. It follows from the linearity of
$f$ that $f(b) + f(\bar{b}) = f(b + \bar{b}) = 1$, because
$b + \bar{b} = 11111\cdots$. Hence, if $x \notin \mathbb{D}$ has
binary expansion $b$, then $1 - x$ has binary expansion $\bar{b}$.
It can be easily verified that $g_i(1 - z) = 1 - g_{1-i}(z)$ for
$i = 0, 1$. Hence,

$$\begin{aligned}
p_{b^n}(z) &= g_{b_n}(\cdots g_{b_2}(g_{b_1}(z)) \cdots) \\
&= g_{b_n}(g_{b_{n-1}}(\cdots g_{b_2}(1 - g_{\bar{b}_1}(1 - z)) \cdots)) \\
&= g_{b_n}(g_{b_{n-1}}(\cdots 1 - g_{\bar{b}_2}(g_{\bar{b}_1}(1 - z)) \cdots)) \\
&= 1 - g_{\bar{b}_n}(\cdots g_{\bar{b}_2}(g_{\bar{b}_1}(1 - z)) \cdots) \\
&= 1 - p_{\bar{b}^n}(1 - z).
\end{aligned}$$

If $0 < z < \theta(b)$, then $1 - \theta(b) < 1 - z < 1$. Since
$0 < z < \theta(b)$ implies $p_{b^n}(z) \to 0$ and
$p_{\bar{b}^n}(1 - z) \to 1$, we get
$\theta(\bar{b}) = 1 - \theta(b)$, and hence
$\vartheta(1 - x) = 1 - \vartheta(x)$. This completes the proof. ■
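
The symmetry relation between the composed map for a sequence and for its bitwise complement can be spot-checked numerically. The sketch below is illustrative; the names `g` and `p` mirror the symbols $g_i$ and $p_{b^n}$, and the test sequence and sample points are arbitrary assumptions.

```python
def g(bit, z):
    """g_1(z) = z^2, g_0(z) = 2z - z^2."""
    return z * z if bit == 1 else 2 * z - z * z

def p(bits, z):
    """Composition g_{b_n}(... g_{b_1}(z) ...)."""
    for bit in bits:
        z = g(bit, z)
    return z

bits = [1, 0, 0, 1, 1, 0, 1]
flipped = [1 - b for b in bits]  # bitwise complement of the sequence

for z in (0.1, 0.37, 0.5, 0.82):
    lhs = p(bits, z)
    rhs = 1 - p(flipped, 1 - z)
    assert abs(lhs - rhs) < 1e-12
    print(z, lhs, rhs)
```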


Appendix C

Proof of Proposition 4

That $\mathcal{G} \cap \mathcal{B} \subset \mathbb{D}$ follows from
the fact that only dyadic rationals have a non-unique binary
expansion. In particular, the preimage of every $x \in \mathbb{D}$
consists of two elements, namely

$$(b^{n-1} b_n 0000000\cdots) \tag{29a}$$

and

$$(b^{n-1} \bar{b}_n 1111111\cdots) \tag{29b}$$

where $\bar{b}_n = 1 - b_n$. By the properties of combining and
splitting,

$$0 < I(W_{2^n}^{b^n}) < 1. \tag{30}$$

We first show that (29b) leads to a good channel. To this end,
observe that, by [2, Prop. 7], the Bhattacharyya parameter satisfies
$0 < Z(W_{2^n}^{b^{n-1}\bar{b}_n}) < 1$. Iterating the squaring
operation drives the Bhattacharyya parameter to zero, i.e.,
$Z(W_\infty^{b^{n-1}\bar{b}_n 111\cdots}) = 0$, hence
$I(W_\infty^{b^{n-1}\bar{b}_n 111\cdots}) = 1$ and
$\mathbb{D} \subset \mathcal{G}$.

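To show that (29a) leads to a bad channel, assume that
$I(W_\infty^{b^n 0000000\cdots}) = \delta$. We now show that for
every $a \in \Omega$,
$I(W_\infty^{b^n 0000000\cdots}) \le I(W_\infty^{b^n a})$. For
example, take $a = (1011100\cdots)$. By Lemmas 4 (a) and 3 (b), the
following list of relations can be shown:

$$\begin{aligned}
W_{2^n}^{b^n} &\overset{(a)}{\preccurlyeq} W_{2^{n+1}}^{b^n 1} \\
W_{2^{n+1}}^{b^n 0} &\overset{(b)}{\preccurlyeq} W_{2^{n+2}}^{b^n 10} \\
W_{2^{n+1}}^{b^n 0} &\preccurlyeq W_{2^{n+3}}^{b^n 101} \\
W_{2^{n+1}}^{b^n 0} &\preccurlyeq W_{2^{n+4}}^{b^n 1011} \\
W_{2^{n+1}}^{b^n 0} &\preccurlyeq W_{2^{n+5}}^{b^n 10111} \\
W_{2^{n+2}}^{b^n 00} &\overset{(b)}{\preccurlyeq} W_{2^{n+6}}^{b^n 101110} \\
W_{2^{n+3}}^{b^n 000} &\overset{(b)}{\preccurlyeq} W_{2^{n+7}}^{b^n 1011100} \\
&\;\;\vdots
\end{aligned}$$

and hence,
$W_\infty^{b^n 0000000\cdots} \preccurlyeq W_\infty^{b^n a}$. By
Lemma 3,
$\delta = I(W_\infty^{b^n 0000000\cdots}) \le I(W_\infty^{b^n a})$
for every $a \in \Omega$, hence also

$$\delta \le \inf_{a} I(W_\infty^{b^n a}). \tag{31}$$

But since $0 < I(W_{2^n}^{b^n}) < 1$, by Proposition 1 there must be
sequences $a$ such that $I(W_\infty^{b^n a}) = 0$, hence
$\delta = 0$ and $\mathbb{D} \subset \mathcal{B}$. ■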


Appendix D

Proof of Proposition 5

The proof follows from showing that between every two dyadic
rationals we can find a rational
$x \in \mathbb{Q} \setminus \mathbb{D}$ such that
$x \in \mathcal{G}$. To this end, fix $x_1 = p/2^n$ and
$x_2 = (p+1)/2^n$. Let further $b^n$ be the terminating binary
expansion of $x_1$, i.e., $f(b^n 000\cdots) = x_1$. Let $a^k$ be
such that $a_1 = \cdots = a_{k-1} = 1$ and $a_k = 0$. Note that
$x := f(b^n a^k a^k a^k \cdots) \in (x_1, x_2)$. We now bound the
polynomial $p_{a^k}$ from above:

$$p_{a^k}(z) = 2z^{2^{k-1}} - z^{2^k} \le 2z^{2^{k-1}}$$

The bound crosses $z$ at $z = 0$ and at
$z^* = 2^{-1/(2^{k-1}-1)}$.

From this follows that $p_{a^k}(z) < z$ for $z < z^*$, where $z^*$
can be made arbitrarily close to one for $k$ sufficiently large.
Hence, if $z_{i+1} = p_{a^k}(z_i)$, then $z_i \to 0$ if
$z_0 < z^*$. Let $z_0 = Z(W_{2^n}^{b^n})$ and let $k$ be
sufficiently large such that $z^* > z_0$. Then,
$Z(W_\infty^{b^n a^k a^k \cdots}) = 0$ and $x \in \mathcal{G}$.

Proof for BECs: It remains to show that also
$\mathcal{B} \setminus \mathbb{D}$ is dense in $[0,1]$. To this end,
we consider the sequence $a^k$ such that
$a_1 = \cdots = a_{k-1} = 0$ and $a_k = 1$. We now bound the
polynomial $p_{a^k}$ from below:

$$\begin{aligned}
p_{a^k}(z) &= \left(1 - (1 - z)^{2^{k-1}}\right)^2 \\
&= 1 - 2(1 - z)^{2^{k-1}} + (1 - z)^{2^k} \\
&\ge 1 - 2(1 - z)^{2^{k-1}}
\end{aligned}$$

The bound crosses $z$ at $z = 1$ and at
$z^* = 1 - 2^{-1/(2^{k-1}-1)}$.

From this follows that $p_{a^k}(z) > z$ for $z > z^*$, where $z^*$
can be made arbitrarily close to zero for $k$ sufficiently large.
Hence, if $z_{i+1} = p_{a^k}(z_i)$, then $z_i \to 1$ if
$z_0 > z^*$. Let $z_0 = Z(W_{2^n}^{b^n})$ and let $k$ be
sufficiently large such that $z^* < z_0$. Then,
$Z(W_\infty^{b^n a^k a^k \cdots}) = 1$ and $x \in \mathcal{B}$. ■
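
The bound used in the first part of the proof can be sampled numerically. The sketch below treats the period consisting of $k-1$ ones followed by a zero, i.e., $k-1$ squarings followed by $z \mapsto 2z - z^2$, and checks that the composed polynomial stays below the identity on the open interval up to the crossing point. The helper names `p_good` and `z_star` and the sampling grid are illustrative assumptions.

```python
def p_good(k, z):
    """Period map for 1...10: (k-1) squarings, then 2z - z^2."""
    m = 2 ** (k - 1)
    return 2 * z ** m - z ** (2 * m)

def z_star(k):
    """Nonzero crossing of the upper bound 2 z^(2^(k-1)) with z."""
    return 2 ** (-1 / (2 ** (k - 1) - 1))

for k in range(2, 8):
    zs = z_star(k)
    # Sample the open interval (0, z*) and confirm the map lies below z.
    samples = [zs * i / 1000 for i in range(1, 1000)]
    assert all(p_good(k, z) < z for z in samples)
    print(k, zs)
```

The printed crossing points increase towards one with growing $k$, which is the property the proof relies on.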

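Appendix E

Proof of Proposition 6

In the proof we use Lemma 1. Note that
$\lambda(\mathcal{G}) = \lambda(\mathcal{G} \setminus \mathbb{D}) +
\lambda(\mathbb{D}) = \lambda(\mathcal{G} \setminus \mathbb{D})$,
and similarly,
$\lambda(\mathcal{B}) = \lambda(\mathcal{B} \setminus \mathbb{D})$.
Since $f$ is bijective on
$\Omega' := \Omega \setminus f^{-1}(\mathbb{D})$, Definition 1
implies

$$x \in \mathcal{G} \setminus \mathbb{D} \Leftrightarrow I(W_\infty^{f^{-1}(x)}) = 1 \tag{32}$$

or $f^{-1}(\mathcal{G} \setminus \mathbb{D}) = \{b \in \Omega'\colon
I(W_\infty^b) = 1\}$. Note further that
$\mathbb{P}(f^{-1}(\mathbb{D})) = 0$. Hence, by Proposition 1,

$$\lambda(\mathcal{G}) = \lambda(\mathcal{G} \setminus \mathbb{D}) = \mathbb{P}(\{b \in \Omega'\colon I(W_\infty^b) = 1\}) = \mathbb{P}(I_\infty = 1) = I(W). \tag{33}$$

The proof for the set of bad channels follows along the same
lines. ■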

Appendix F

Proof of Proposition 7

Since the dyadic rationals are self-similar and since, by
Proposition 4, $\mathbb{D} \subset \mathcal{G}$, one has, for all
$n$ and $k$,

$$\mathcal{G}_n(k) \cap \mathbb{D} = 2(\mathcal{G}_{n+1}(2k) \cap \mathbb{D}) - k2^{-n}. \tag{34}$$

We now treat those values in $[0,1]$ that are not dyadic rationals.
If $b_k^n := b_1 b_2 \cdots b_n$ is the terminating binary expansion
of $(k-1)2^{-n}$, every value in $[(k-1)2^{-n}, k2^{-n}]$ has a
binary expansion $b_k^n a$ for some $a \in \Omega$, where $b_n = 1$
if and only if $(k-1)$ is odd. Similarly, and since $(2k-1)$ is
always odd, every value in $[(2k-1)2^{-n-1}, k2^{-n}]$ has a binary
expansion $b_k^n 1 a'$ for some $a' \in \Omega$. Assume that
$a' = a$. Then, by Lemmas 3 and 4,
$W_\infty^{b_k^n a} \preccurlyeq W_\infty^{b_k^n 1 a}$ for all $a$.
Hence, if $f(b_k^n a) \in \mathcal{G}_n(k)$, then
$f(b_k^n 1 a) \in \mathcal{G}_{n+1}(2k)$. It remains to show that
$2f(b_k^n 1 a) - f(b_{k+1}^n) = f(b_k^n a)$:

$$\begin{aligned}
f(b_k^n a) + f(b_{k+1}^n) &= f(b_k^n) + 2^{-n} f(a) + f(b_{k+1}^n) \\
&= (k-1)2^{-n} + 2^{-n} f(a) + k2^{-n} \\
&= (2k-1)2^{-n} + 2^{-n} f(a) \\
&= 2(2k-1)2^{-n-1} + 2 \cdot 2^{-n-1} f(a) \\
&= 2f(b_k^n 1) + 2 \cdot 2^{-n-1} f(a) \\
&= 2f(b_k^n 1 a)
\end{aligned}$$

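Proof for Symmetric Channels: Since $(2k-2)$ is always even, every value in $[(2k-2)2^{-n-1}, (2k-1)2^{-n-1}]$ has a binary expansion $b_k^n 0 a$ for some $a \in \Omega$. Then, by Lemmas 3 and 4, the channel indexed by $b_k^n 0 a$ is degraded with respect to the channel indexed by $b_k^n a$, for all $a$. Hence, if $f(b_k^n 0 a) \in \mathcal{G}_{n+1}(2k-1)$, then $f(b_k^n a) \in \mathcal{G}_n(k)$. It remains to show that $2f(b_k^n 0 a) - f(b_k^n) = f(b_k^n a)$:

\[
\begin{aligned}
f(b_k^n a) + f(b_k^n) &= f(b_k^n) + 2^{-n} f(a) + f(b_k^n) \\
&= (k-1)2^{-n} + 2^{-n} f(a) + (k-1)2^{-n} \\
&= (2k-2)2^{-n} + 2^{-n} f(a) \\
&= 2(2k-2)2^{-n-1} + 2 \cdot 2^{-n-1} f(a) \\
&= 2f(b_k^n 0) + 2 \cdot 2^{-n-1} f(a) \\
&= 2f(b_k^n 0 a).
\end{aligned}
\]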
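The two scaling identities used in this appendix can be verified with exact rational arithmetic. The sketch below is only an illustration: it truncates the tail $a$ to a finite bit list (an assumption made so that the check is computable), and the helper names `f`, `prefix`, and `check` are hypothetical.

```python
from fractions import Fraction


def f(bits):
    """Map a finite bit list b_1 b_2 ... to the number sum_i b_i 2^{-i}."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


def prefix(k, n):
    """Terminating n-bit binary expansion of (k-1) 2^{-n}, for 1 <= k <= 2^n."""
    return [((k - 1) >> (n - 1 - i)) & 1 for i in range(n)]


def check(k, n, a):
    """Check both identities for the prefix of (k-1) 2^{-n} and a finite tail a."""
    b = prefix(k, n)
    target = f(b + a)
    # Right half of the interval: 2 f(b 1 a) - k 2^{-n} = f(b a)
    upper = 2 * f(b + [1] + a) - Fraction(k, 2 ** n)
    # Left half of the interval: 2 f(b 0 a) - (k-1) 2^{-n} = f(b a)
    lower = 2 * f(b + [0] + a) - Fraction(k - 1, 2 ** n)
    return target == upper == lower


if __name__ == "__main__":
    n = 3
    tail = [1, 0, 1, 1]
    print(all(check(k, n, tail) for k in range(1, 2 ** n + 1)))
```

Using `Fraction` avoids rounding, so the equalities are tested exactly rather than up to a tolerance.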


Appendix G 

Proof of Proposition 8

We take the non-terminating expansion of $x \in \mathbb{D}$, i.e., there is a $b^k \in \{0,1\}^k$ such that $f(b^k 1111\cdots) = x$. Hence, $w(b^n) \ge n - k$ for $n \ge k$. In Definition 3 we can take the binary logarithm on both sides of the inequality to get the condition

\[ x \in \mathcal{H}(\rho) \;\Leftrightarrow\; \exists b \in f^{-1}(x): \ \liminf_{n\to\infty} w(b^n) - n\rho > 0. \tag{35} \]

But

\[ \liminf_{n\to\infty} w(b^n) - n\rho \ge \lim_{n\to\infty} n(1-\rho) - k \tag{36} \]

goes to infinity for $\rho < 1$. ■

Appendix H 

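Proof of Proposition 9

By Example 7, dyadic rationals are not simply normal; hence, let $\mathcal{N} \subset [0,1]\setminus\mathbb{D}$ be the set of simply normal numbers in $[0,1]$. Note that $f$ is bijective on $\mathcal{N}$ by Lemma 1. By Lemma 5 we have

\[ \forall b \in f^{-1}(\mathcal{N}): \quad w(b^n) = \frac{n}{2} + o(n). \tag{37} \]

Fix $\rho$. Then,

\[ \liminf_{n\to\infty} w(b^n) - n\rho = \lim_{n\to\infty} n\left(\frac{1}{2} - \rho\right) + o(n). \tag{38} \]

If $\rho < 1/2$, then this limit diverges to infinity, and hence $\mathcal{N} \subset \mathcal{H}(\rho)$. Thus, since $\lambda(\mathcal{N}) = 1$, we have $\lambda(\mathcal{H}(\rho)) = 1$. If $\rho > 1/2$, the limit diverges to minus infinity for every element of $\mathcal{N}$, and hence $\mathcal{N} \cap \mathcal{H}(\rho) = \emptyset$. Thus, $\mathcal{H}(\rho) \subset [0,1]\setminus\mathcal{N}$, from which it follows that $\lambda(\mathcal{H}(\rho)) = 0$.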

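Now let $\rho = 1/2$. We define a random variable $B$ on our probability space such that, for all $b \in \Omega$, $B(b) = b$. Then $B$ is a sequence of independent, identically distributed Bernoulli-$1/2$ random variables, i.e., for all $i$ we have $\mathbb{P}(B_i = 1) = \mathbb{P}(B_i = 0) = 1/2$. We have

\[ \lambda(\mathcal{H}(1/2)) = \mathbb{P}\left(\liminf_{n\to\infty} w(B^n) - \frac{n}{2} > 0\right). \tag{39} \]

Consider the simple random walk $S_n := 2w(B^n) - n$, which has steps $\pm 1$ and the same sign as $w(B^n) - n/2$. Let $N_0(n)$ be the number of zero crossings of the sequence $S_1, \dots, S_n$, and let $N_0(n, b)$ be the number of zero crossings corresponding to the realization $b \in \Omega$. The event $\liminf_{n\to\infty} w(b^n) - \frac{n}{2} > 0$ can only happen if the realization of $S_n$ corresponding to $b$ has only finitely many zero crossings, i.e.,

\[
\begin{aligned}
\Big\{b \in \Omega: \liminf_{n\to\infty} w(b^n) - \frac{n}{2} > 0\Big\}
&\subset \Big\{b \in \Omega: \exists R \in \mathbb{N}_0: \lim_{n\to\infty} N_0(n, b) \le R\Big\} \\
&= \bigcup_{R=0}^{\infty} \Big\{b \in \Omega: \lim_{n\to\infty} N_0(n, b) \le R\Big\} \\
&= \bigcup_{R=0}^{\infty} \liminf_{n\to\infty} \big\{b \in \Omega: N_0(n, b) \le R\big\}
\end{aligned}
\]

and hence

\[
\begin{aligned}
\mathbb{P}\Big(\Big\{b \in \Omega: \liminf_{n\to\infty} w(b^n) - \frac{n}{2} > 0\Big\}\Big)
&\le \sum_{R=0}^{\infty} \mathbb{P}\Big(\liminf_{n\to\infty} \big\{b \in \Omega: N_0(n, b) \le R\big\}\Big) \\
&\le \sum_{R=0}^{\infty} \lim_{n\to\infty} \mathbb{P}\big(N_0(n) \le R\big)
\end{aligned}
\tag{40}
\]

where the first inequality is the union bound and the second inequality is due to Fatou's lemma [?, Lem. 1.28, p. 23].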

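Since $N_0(n)$ is nondecreasing in $n$, the limit in (40) can be evaluated along odd indices. With [24, Ch. III.5, p. 84]

\[ \mathbb{P}\big(N_0(2n+1) = R\big) = 2\,\mathbb{P}\big(S_{2n+1} = 2R+1\big) \tag{41} \]

we get

\[
\begin{aligned}
\mathbb{P}\big(N_0(2n+1) \le R\big) &= 2\sum_{r=0}^{R} \mathbb{P}\big(S_{2n+1} = 2r+1\big) \\
&\overset{(a)}{=} 2\sum_{r=0}^{R} \binom{2n+1}{n-r} 2^{-2n-1} \\
&\le 2^{-2n} \sum_{r=0}^{R} \binom{2n+2}{n+1} \\
&= 2^{-2n}(R+1)\binom{2n+2}{n+1} \\
&\overset{(b)}{\le} 2^{-2n}(R+1)\,\frac{e\,2^{2n+2}}{\sqrt{(n+1)\pi}} \\
&= \frac{4e(R+1)}{\sqrt{(n+1)\pi}}
\end{aligned}
\]

where (a) is [24, eq. (2.2), p. 75] and (b) is due to Stirling's approximation [?, eq. 6.1.38, p. 257]. Since this bound vanishes as $n \to \infty$ for every fixed $R$, each term of the sum in (40) is zero, and we have

\[ \lambda(\mathcal{H}(1/2)) = \mathbb{P}\left(\liminf_{n\to\infty} w(B^n) - \frac{n}{2} > 0\right) = 0. \tag{42} \]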

This completes the proof. ■ 
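The zero-crossing count and its bound can be checked on small instances. The sketch below assumes Feller's convention that a sign change occurs at time $j$ when $S_{j-1}$ and $S_{j+1}$ have opposite signs, and it evaluates the walk at the odd time $2n+1$; the names `prob_at_most`, `bound`, and `brute_force_at_most` are illustrative only.

```python
from itertools import product
from math import comb, e, pi, sqrt


def prob_at_most(R, n):
    """P(at most R sign changes up to time 2n+1), from the closed form
    2 * sum_r C(2n+1, n-r) * 2^(-2n-1); terms with r > n vanish."""
    return sum(comb(2 * n + 1, n - r) for r in range(min(R, n) + 1)) / 2 ** (2 * n)


def bound(R, n):
    """Upper bound 4e(R+1)/sqrt((n+1)pi) obtained via Stirling's approximation."""
    return 4 * e * (R + 1) / sqrt((n + 1) * pi)


def brute_force_at_most(R, n):
    """Enumerate all +-1 paths of length 2n+1 and count sign changes directly."""
    length = 2 * n + 1
    hits = 0
    for steps in product((-1, 1), repeat=length):
        s = [0]
        for x in steps:
            s.append(s[-1] + x)
        # A sign change at time j means S_{j-1} and S_{j+1} have opposite signs.
        changes = sum(1 for j in range(1, length) if s[j - 1] * s[j + 1] < 0)
        if changes <= R:
            hits += 1
    return hits / 2 ** length


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        for R in (0, 1, 2):
            print(n, R, prob_at_most(R, n), brute_force_at_most(R, n), bound(R, n))
```

The enumeration is exponential in $n$ and is meant only to confirm the closed form for short walks; the decay of the probability in $n$ is visible from `prob_at_most` alone.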

Appendix I 

Proof of Proposition 10

That $d(\mathcal{H}(\rho)) = 1$ for $\rho < 1/2$ follows from Proposition 9 in combination with Corollary 1. For $\rho \ge 1/2$, we define

\[ \mathcal{N}_p := \left\{ x \in [0,1]: \exists b \in f^{-1}(x): \lim_{n\to\infty} \frac{w(b^n)}{n} = p \right\} \tag{43} \]

for some $p \in (0,1)$. Note that $\mathcal{N}_{1/2} = \mathcal{N}$. By [26] (cf. [?, Chapter 8] for further notes), the Hausdorff dimension of this set is given by¹

\[ d(\mathcal{N}_p) = h_2(p). \tag{44} \]

Reasoning as in the proof of Proposition 9, $\mathcal{N}_p \subset \mathcal{H}(\rho)$ if $p > \rho$ and $\mathcal{N}_p \not\subset \mathcal{H}(\rho)$ if $p < \rho$. As a consequence,

\[ \bigcup_{n=1}^{\infty} \mathcal{N}_{\rho + 1/n} \subset \mathcal{H}(\rho). \tag{45} \]

For a countable sequence of sets $A_n$, Hausdorff dimension satisfies [?, p. 49]

\[ d\left(\bigcup_{n=1}^{\infty} A_n\right) = \sup_{n \ge 1} d(A_n), \tag{46} \]

and hence, by the monotonicity of Hausdorff dimension [?, p. 48],

\[ d(\mathcal{H}(\rho)) \ge \sup_{n \ge 1} h_2(\rho + 1/n) = h_2(\rho) \tag{47} \]

where the last equality follows from the fact that the binary entropy function is continuous and decreases with increasing $p$ for $p \ge 1/2$. In particular, for $\rho = 1/2$, $d(\mathcal{H}(\rho)) = 1$. This completes the proof. ■
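The supremum in the dimension bound can be illustrated numerically. This is a minimal sketch, assuming $h_2$ denotes the binary entropy function in bits and restricting to those $n$ for which $\rho + 1/n < 1$; the names `h2` and `lower_bound_sup` are hypothetical.

```python
from math import log2


def h2(p):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def lower_bound_sup(rho, n_max=100_000):
    """Largest value of h2(rho + 1/n) over n <= n_max with rho + 1/n < 1.

    This approximates sup_n h2(rho + 1/n) from below."""
    values = [h2(rho + 1.0 / n) for n in range(1, n_max + 1) if rho + 1.0 / n < 1.0]
    return max(values)


if __name__ == "__main__":
    for rho in (0.5, 0.6, 0.7, 0.9):
        print(rho, lower_bound_sup(rho), h2(rho))
```

For $\rho \ge 1/2$ every admissible $\rho + 1/n$ lies on the decreasing branch of the entropy function, so the approximation approaches $h_2(\rho)$ from below as `n_max` grows.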


Appendix J 

Proof of Proposition 11

The proof follows along the lines of the proof of Proposition 7. Let again $b_k^n$ be the terminating expansion of $(k-1)2^{-n}$, and let $a \in \Omega$. The connections between the sequences $b := b_k^n a$, $b_- := b_k^n 0 a$, and $b_+ := b_k^n 1 a$ have been established above. To prove the theorem, we have to show that

\begin{align}
& \liminf_{m\to\infty} w(b_-^m) - \rho m > 0 \tag{48a} \\
\Rightarrow\ & \liminf_{m\to\infty} w(b^m) - \rho m > 0 \tag{48b} \\
\Rightarrow\ & \liminf_{m\to\infty} w(b_+^m) - \rho m > 0. \tag{48c}
\end{align}

This is obtained by

\begin{align}
\liminf_{m\to\infty} w(b_-^m) - \rho m
&= w(b_k^n 0) - \rho(n+1) + \liminf_{m\to\infty} w(a^m) - \rho m \notag \\
&= w(b_k^n) - \rho n - \rho + \liminf_{m\to\infty} w(a^m) - \rho m \notag \\
&\le w(b_k^n) - \rho n + \liminf_{m\to\infty} w(a^m) - \rho m \tag{49a} \\
&\le w(b_k^n) - \rho n + (1-\rho) + \liminf_{m\to\infty} w(a^m) - \rho m \notag \\
&= w(b_k^n 1) - \rho(n+1) + \liminf_{m\to\infty} w(a^m) - \rho m \tag{49b}
\end{align}

where (49a) equals the limit inferior in (48b) and where (49b) equals the limit inferior in (48c). The inequalities yield the desired result. ■
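A finite-length analogue of the chain of inequalities can be checked directly. The sketch below is an illustration under the assumption that the tail $a$ is truncated to a finite bit list and that $0 \le \rho \le 1$; `score` and `chain_holds` are hypothetical helper names.

```python
from fractions import Fraction


def score(bits, rho):
    """Hamming weight of the bit list minus rho times its length."""
    return sum(bits) - rho * len(bits)


def chain_holds(b, a, rho):
    """Inserting a 0 after the prefix b lowers the score by rho, and
    inserting a 1 raises it by 1 - rho, so the three scores are ordered."""
    minus = score(b + [0] + a, rho)
    middle = score(b + a, rho)
    plus = score(b + [1] + a, rho)
    return minus <= middle <= plus


if __name__ == "__main__":
    rho = Fraction(3, 5)
    print(chain_holds([0, 1, 0], [1, 1, 0, 1], rho))
```

Passing `rho` as a `Fraction` keeps the comparison exact; the ordering itself only relies on $0 \le \rho \le 1$.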


¹ Interestingly, in Eggleston's paper, the dimension was not connected to entropy; it was submitted earlier in the same year in which Shannon's Mathematical Theory of Communication was published.